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FIELD OF INVENTION 
This invention relates to computer database technology 
applied to genetic data and corresponding cell information. 
More specifically, a relational database system that stores 
DNA sequences, the corresponding source data, and other 
related scientific data is disclosed. 

BACKGROUND OF THE INVENTION 
Relational databases are generally known in the art. See,
for example, C.J. Date, "An Introduction to Database Systems,"
Addison-Wesley Publishing Company, 1982 (particularly Part 2).

In general, a relational database can be characterized as
a system for storing data represented as a plurality of
tables. A row of each table, also referred to as a tuple,
represents a record of information. A column is essentially a
collection of values for the same field of the stored records.
Each column is also referred to as an attribute of the stored
records. In other words, each record in a given table of a
relational database includes a set of fields that correspond
to the attributes of the table. A set of all the values from
which the actual values of an attribute can be drawn is
referred to as a domain. As discussed on page 65 of the
above-referenced text, "a crucial feature of relational data
structure is that associations between tuples (rows) are
represented solely by data values in columns drawn from a
common domain."
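The relational concepts above can be illustrated with a minimal sketch. The example uses Python's standard-library sqlite3 module; the tables and columns (tissue, sample, tissue_id, donor_age) are hypothetical and chosen only for illustration, not taken from the disclosure. It shows rows in two tables associated solely through values drawn from a common domain.

```python
import sqlite3

# In-memory database; the schema below is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tissue (tissue_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sample (sample_id INTEGER PRIMARY KEY,
                         tissue_id INTEGER, donor_age INTEGER);
""")
conn.executemany("INSERT INTO tissue VALUES (?, ?)",
                 [(1, "liver"), (2, "pituitary")])
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)",
                 [(10, 1, 45), (11, 2, 60), (12, 1, 33)])

# Each row is a tuple and each column an attribute. The two tables are
# associated only by matching values in their tissue_id columns, which
# are drawn from a common domain.
rows = conn.execute("""
    SELECT sample.sample_id, tissue.name
    FROM sample JOIN tissue ON sample.tissue_id = tissue.tissue_id
    ORDER BY sample.sample_id
""").fetchall()
print(rows)
```

No pointer or physical link connects a sample to its tissue; the association exists only because the same value appears in both columns.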

Previously most of the analysis of genetic information 
has been done using chemical methods in a laboratory. 
Computerized research tools have been limited essentially to 
performing comparisons of sequence information to determine 
whether a particular genetic sequence has been previously 
identified. Such tools may provide effective searching 
techniques for genetic sequences; however, they do not store 
and manipulate diverse scientific information, such as the
correlation between the cDNA sequences and the types of cells 
from which they were derived. Thus, the existing computerized 
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specific preferred embodiment in conjunction with the 
accompanying drawings in which: 

Fig. 1 symbolically depicts an overall architecture of
the system of the preferred embodiment of the present
invention.

Fig. 2 is a flowchart symbolically depicting the process 
of cloning and sequencing cDNAs. 

Figs. 3A, 3B and 4-10 illustrate portions of the
biological relational database of the preferred embodiment of
the present invention.

Fig. 11 illustrates an example of the output of an
abundance analysis query of the relational database of the 
preferred embodiment. 

Fig. 12 illustrates an example of the output of a 
subtraction analysis query using the database of the 
preferred embodiment. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
According to the preferred embodiment of the present
invention, a system for storing, tracking and manipulating
the genetic data is organized as a relational database. As
illustrated in Fig. 1, the users of the system at their
workstations (6 and 7) can access one or more relational
databases via an integrated Ethernet network 5. The
workstations (6, 7) are typically personal computers known in
the art that usually include data entry means, output
devices, display, CPU, memory (RAM and ROM) and interfaces to
network 5. Database storage 1 illustrates the database of
the preferred embodiment of the present invention, which is
stored at a file server connected to network 5. As
illustrated, it is supported by computer 2, which, as known
in the art, usually includes CPU 4, data storage means 8,
interfaces to the network 9, and input and output devices
(not shown). Reference databases 3 illustrate sources of
data which, for example, may be searched as part of the use
of database 1. Such databases may, for example, include
other sequence, nucleic acid, protein, and motif databases
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It is well known that each cell in an organism such as 
the human body, contains a complete set of genes or genetic 
information. These genes are either active or inactive at 
different times in the cell's life cycle. Some genes are
active in all cells and are necessary for normal and common 
functions, or housekeeping duties. Other genes are only 
active in a particular cell type, because they specify and 
regulate functions peculiar to a tissue or an organ under 
normal conditions. Finally, there are genes which are 
activated only in response to stress or disease. Some stress 
genes, which activate in several cell types, respond to the 
general alarm. Other stress genes are very specific and only 
activate in a particular cell type. Thus genes can be grouped
into very small and specific subsets or subsets of varying,
larger sizes. The classification and understanding of these
nested sets of genes are important in the diagnosis and 
treatment of disease. 

Genes, or double-stranded deoxyribonucleic acid (DNA),
are activated by the transcription or copying of the sense
strand of the DNA molecule into single-stranded messenger
ribonucleic acid (mRNA). The message inherent in the mRNA
sequence is subsequently translated into amino acids, the 
molecular building blocks of the polypeptides or proteins that 
function structurally or enzymatically in the cell. 

The activities taking place at any one time and the
relative importance of those activities are reflected in the
numbers of mRNA molecules found in the cell. Some mRNAs
(housekeeping) are always present, and their numbers remain
fairly stable in normal cells of any tissue. These mRNAs (e.g.,
actin) represent and carry out the constant background
activity essential to most cell types (the exception to this
case is a mature, differentiated red blood cell, which lacks
DNA but has a set of mRNAs or enzymes which function for the
remainder of its life). In contrast, the mRNAs (routine) which
carry out the duties of a particular cell type are only
activated in that cell type, and the numbers of routine mRNAs
will be stable under normal conditions. If that particular
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cell type is stressed or exposed to disease, the numbers of 
routine mRNAs fluctuate as genes which respond to the 
stress/disease are activated. These stress/disease mRNAs have 
priority over other routine or housekeeping mRNAs, and they 
quickly increase in number. 

For example, the housekeeping genes of brain cells and 
liver cells are shared; cells from both organs transcribe the 
mRNAs that produce the enzymes required to process incoming 
molecules of glucose. However, the mRNAs that make proteins 
for the normal functions of a pituitary cell are different 
from the mRNAs of a liver Kupffer cell although each is 
functioning normally. Likewise, the set of mRNAs from a
diseased liver cell differs from that from a normal liver
cell. In each case, a different and diverse subset of mRNAs
characterizes the cell in a particular situation at a
particular time.

The database of the preferred embodiment provides the
storage, manipulation, and retrieval of the information which
relates to the classification and characterization of unique
populations of mRNAs. On the basis of this information,
scientists can diagnose diseases and design specific
treatments. The wealth of detailed information provides clues
to earlier diagnosis and treatment which contribute to rapid
healing and help avoid permanent impairment or death.

The database system of the present invention takes
advantage of the powerful capabilities of modern computers by
storing genetic information in association with a large amount
of related information. More specifically, in the preferred
embodiment, the information on essentially all the steps of
obtaining tissue, extracting transcripts, cloning, and
identifying cDNA sequences is stored in various relational
tables. Thus, the database of the present invention allows
one to backtrack through the steps performed in the laboratory
in identifying the cDNA sequence. The diverse data stored in
the system of the present invention will in many instances
answer questions frequently asked in molecular biology and
pharmacology without requiring actual experiments, such as:



BNSDOCID: <WO 9623078A1_I_> 



WO 96/23078 

PCT/US95/12429 


[...] targets for pharmaceutical intervention?

[...] Figs. 3A, 3B and 4 through 10. In Fig. 2, the first
step [...] preparation 10 includes the steps of obtaining and
growing the cells so as to prepare them for RNA extraction.
The following step 20 [...] received from an outside source
or a collaborator [...] obtained, it is cloned at step 40 and
[...]. The sequence that is obtained at step 50 is compared at
step 60 to known sequences [...]. Finally, the function of the
[...] sequence [...] is determined at step 70.

Figs. 3A, 3B and 4-10 schematically illustrate the tables of
the database of the preferred embodiment. Exemplary fields
(or attributes) are [...] each box, and each table includes
[...] having a domain which is common to at least one other
table. For example, consider the table indicated as 130,
"Biological Source", and the table indicated as 140,
"Cell Culture/Treatment". In these two tables the common
domain is bio_source_ID. Also, notice that Arrow [...] and
the other end [...] tuple in the Biological Source table
[...] tuple in the Cell Culture/Treatment table.

[...] of the database of the present invention [...] includes
information relating to [...] relating to the biological
source of the cells



SUBSTITUTE SHEET (RULE 26) 


used to obtain the cDNA (boxes 130, 110, 120), cell culture
and treatment (boxes 140, 180), mRNA preparation (box 150)
and cDNA construction (boxes 170, 160). More specifically,
box 130 depicts the table for storing the biological source
information. The source may be cells grown in tissue culture
or cells obtained during surgery from a single individual or
a pooled sample, e.g., pituitary glands obtained from
patients of both sexes and a range of ages. In the preferred
embodiment the biological source table 130 contains
attributes as depicted in Fig. 3A, such as tissue, organ,
gender, age, pathology, etc. The biological source may
reflect a normal, treated or diseased state. A person
skilled in the art will realize that, if desirable, certain
other biological source information can be stored; and on the
basis of this disclosure, such person will be able to include
other relevant attributes if desired.

The data regarding the collaborators, i.e., contributors
of a biological source, is stored in table 110 as depicted in
Fig. 3A, and the information regarding the cell suppliers
contributing to biological sources is stored in table 120.
The source_ID attribute of the biological source table 130
corresponds to either collaborator_ID or supplier_ID of
tables 110 and 120, respectively.
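The either/or correspondence of source_ID can be sketched as follows. This is a minimal illustration using Python's sqlite3 module, not the disclosed implementation. It assumes that collaborator and supplier identifiers never collide, so a lookup in each table in turn is unambiguous; the key bio_source_ID, the name columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collaborator (collaborator_ID TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE supplier (supplier_ID TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE biological_source (bio_source_ID INTEGER PRIMARY KEY,
                                    tissue TEXT, source_ID TEXT);
    INSERT INTO collaborator VALUES ('C1', 'University lab');
    INSERT INTO supplier VALUES ('S1', 'Cell vendor');
    INSERT INTO biological_source VALUES (1, 'pituitary', 'C1'),
                                         (2, 'liver', 'S1');
""")

# Fixed list of (kind, table, key column); never built from user input.
SOURCE_TABLES = (
    ("collaborator", "collaborator", "collaborator_ID"),
    ("supplier", "supplier", "supplier_ID"),
)

def resolve_source(conn, bio_source_id):
    """Return (kind, name) for the contributor of a source, or None."""
    row = conn.execute(
        "SELECT source_ID FROM biological_source WHERE bio_source_ID = ?",
        (bio_source_id,),
    ).fetchone()
    if row is None:
        return None
    # Assumption: an identifier appears in at most one of the two tables.
    for kind, table, column in SOURCE_TABLES:
        hit = conn.execute(
            f"SELECT name FROM {table} WHERE {column} = ?", (row[0],)
        ).fetchone()
        if hit:
            return (kind, hit[0])
    return None
```

If identifiers could collide, one alternative would be an extra column recording which table a source_ID refers to.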

Part of the cell preparation procedure includes the cell
culture and treatment process. Cell culture is carried out
in containers of known size or volume. Density is usually
reported as cells per milliliter (of liquid media) and is
monitored to maintain a healthy cell culture. Density at the
time cells are harvested may be measured either as cell
number or as grams per liter. Treatment may vary. Induction
with a chemical can change a cell from an immature form,
monocyte, to a mature one, macrophage. Stimulation or
activation with a different chemical causes the macrophage to
ingest and digest invading bacteria.

In some cases, a cell culture is split into two or more 
parts, with one subsample maintained in its normal growth 
mode (as the biological control) and other subsample(s)
subjected 

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with which mRNA_source_ID correlates. These two attributes 
in combination, therefore, link records of table 150 to 
tables 140 and table 130. 

Next, as shown in step 30 of Fig. 2, a cDNA sequence is
derived from the mRNA. The cDNA construction requires the
conversion of mRNA into complementary DNA (cDNA), preferably
using oligo dT, random priming, reverse transcription or
other protocols, as known in the art. Useful cloning sites
are designed into the bacteriophage into which the DNA is
packaged or incorporated. Packaging or plating efficiency is
determined by examining the number of primary plaques, i.e.,
individual bacterial colonies, which resulted from a
particular experiment. Information is recorded about the
genetic background of host bacterium and the titer of the
bacteriophage, before and after amplification. The quality
of the library is determined by screening for the actin gene,
present in all normal or diseased cell types, and estimation
of the size of the cDNA fragment which has been inserted
(insert size).

The data related to the cDNA construction is stored in
table 170 in Fig. 3B. As apparent to a person skilled in the
art, the attributes of this table depicted in Fig. 3B provide
detailed information about the cDNA construction. Note that
tables 170 and 150 have a common attribute mRNA_prep_ID.

Preprocessed cDNA fragments can be purchased from an
outside supplier or obtained from a collaborator or customer.
In such a case, the relevant data is stored in the cDNA
supplier table 160 of the database. Table 160
has the attribute supplier_ID, which is also a part of the
cDNA construction table 170.

As depicted in Fig. 2, after the cDNA has been
constructed, the cloning process is performed. The portion
of the database depicted in Fig. 4 relates to the clone
preparation data that is obtained during the cloning process
and includes information relating to excision (box 190),
inoculation (box 200), preparation (box 210), fluorometer
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associated with the fluorometer table 230 via a common
calibration_ID attribute.

After fluorometry analysis, the cDNAs are prepared for
sequencing. Preparation of the cDNAs for sequencing is
recorded along with the methods (and their modifications) used
at that time. The scientists (SWAT) troubleshoot the
sequencing process and track the results of their custom
protocols. The preparation table is illustrated as 210.

Table 250, clone log, combines the information regarding
the cloning process as illustrated in Fig. 4. In particular,
it contains an attribute Inoculation_ID which is also an
attribute of the inoculation table 200. An attribute clone_ID
is shared with the fluorometer log table 220. An attribute
Preparation_ID is also a part of the preparation table 210.
The dead_or_alive attribute of the clone log table 250, for
example, identifies dead clones in which the plasmid
preparation did not yield enough DNA to sequence.

The data related to the process of sequencing is stored
as depicted in the sequencing portion of the database
illustrated in Fig. 5. This portion includes information
relating to specifications of the sequence and related
information. It includes the sequencing log (box 300), the
sequencing gel (box 280), the reaction set (box 270) and the
sequence archive (box 290). The specification of the sequence
and related information are stored as attributes in sequencing
log table 300. It should be noted that a clone can be
sequenced multiple times. Table 260 (sequencing link) links
the clone log table 250 with the sequencing log table 300.
The sequencing link table 260 contains a clone_ID attribute,
which is in common with the same attribute in the clone log
table 250, and a sequencing_log_ID attribute which is also
included in the table 300.
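Because a clone can be sequenced multiple times, the sequencing link table acts as a junction between the clone log and the sequencing log. A minimal sketch, assuming Python's sqlite3 module: the attribute names clone_ID, sequencing_log_ID and dead_or_alive follow the text, while the run_date column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clone_log (clone_ID INTEGER PRIMARY KEY, dead_or_alive TEXT);
    CREATE TABLE sequencing_log (sequencing_log_ID INTEGER PRIMARY KEY,
                                 run_date TEXT);
    -- Junction table: one row per (clone, sequencing run) pair.
    CREATE TABLE sequencing_link (clone_ID INTEGER, sequencing_log_ID INTEGER);
    INSERT INTO clone_log VALUES (1, 'alive'), (2, 'alive');
    INSERT INTO sequencing_log VALUES (100, '1995-01-10'),
                                      (101, '1995-01-12'),
                                      (102, '1995-01-15');
    INSERT INTO sequencing_link VALUES (1, 100), (1, 101), (2, 102);
""")

def sequencing_runs_for_clone(conn, clone_id):
    """Return the sequencing_log_ID of every run recorded for a clone."""
    rows = conn.execute("""
        SELECT s.sequencing_log_ID
        FROM sequencing_link AS l
        JOIN sequencing_log AS s
          ON s.sequencing_log_ID = l.sequencing_log_ID
        WHERE l.clone_ID = ?
        ORDER BY s.sequencing_log_ID
    """, (clone_id,)).fetchall()
    return [r[0] for r in rows]
```

Here clone 1 has two sequencing runs and clone 2 has one; neither the clone log nor the sequencing log needs a repeated column to record this.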

Sequencing of the cDNAs is performed on an automated ABI
system. The sequencing gel is evaluated for the sharpness and
darkness of the signal which each of the deoxyribonucleotides
or bases (adenine, cytosine, guanine, and thymine) displays,
their physical proximity to one another in the gel, and the
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nature of the problem, staff involved in maintenance and 
pertinent comments. 

In a preferred embodiment, the Catalyst and Computer
Maintenance Logs tables (905 and 910 respectively) are linked
through the computer_ID attribute and include similar
information to that of the Sequencer Maintenance Log and can
be related to essentially any DNA sequence.

The Equipment Log table 915 connects with Maintenance
tables 900-910 via the instrument_number and computer_ID
attributes and has information on the equipment or instruments
used in the sequencing operation. In a preferred embodiment,
table 915 stores information regarding equipment name and
serial number, vendor identifier, and date installed.

A separate vendor table 920 connects with the Equipment
Log table 915 via the vendor_identifier attribute, and stores,
for example, the company name, address, phone number, fax
number and contact person. The vendor listing can also have
additional information on the vendor, including E-mail address
and date contract signed.

Fig. 7 illustrates a portion of the database of the
preferred embodiment for storing information regarding the
sequencing reagents. The Gel Link table 925 links to the Gel
Key table 280 via the gel_key attribute and to the Gel
Solution table 935 via the gel_solution_ID attribute.

The Gel Solution table 935 includes information on the
gel solution and further includes the date the solution was
made and who prepared the solution. The Gel Solution-Lot Link
table 950 links to the Gel Solution table 935 via the
gel_solution_ID attribute and also includes lot_number and
reagent_ID attributes, which are shared with the Lot table 965.
The Reaction-Cocktail Link table 930 shares the
reaction_set_ID attribute with the reaction set table 270.
The Reaction-Cocktail Link table 930 shares cocktail_ID with
the Cocktail table 940. The Cocktail table 940 also has the
date the cocktail was made and staff person who made the
cocktail. The Cocktail-Lot Link table 955 has the cocktail_ID
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attribute in common with the Cocktail table 940 and the lot_number
and reagent_ID attributes in common with the Lot table 965.

The Lot table 965 includes reagent ID and lot number,
vendor identifier, date received and date used. The vendor ID
attribute is shared with the Vendor table 960. A separate
reagent table 970 shares the reagent_ID attribute with the Lot
table 965 and also has an expanded reagent name.

Experimental sets of sequences may be stored in the
database in the express sets portion shown in Fig. 8. This
portion includes an express link table 370, a clone variant
table 380, an experimental table 390, a cleanup table 400 and
a resequencing table 410. Express Link table 370 stores
sequence sets which have higher priority. They are given
unique identifiers and handled separately from the batch
process materials. Clone Variant table 380 refers to variant
sequences flagged by an individual investigator. The variants
are evaluated by that scientist, collaborator, or customer and
appropriate action is taken. The experimental sequences
stored in Experimental table 390 are similar to the variants
above. They may be homologous, allelic or mutant sequences
which have been flagged by a particular scientist. If only a
fragment has been recovered, a full length expression sequence
is ordered, and investigation continued. Cleanup table 400
stores data reflecting the addition of extra steps to the
protocol. The longer procedure is designed to improve
readability of the sequence. Resequencing is simply repeating
the procedure in order to check a sequence or to obtain more
data. Information regarding resequencing is stored in
Resequencing table 410.

Express Link table 370 contains a clone_ID attribute
which is also included in the Clone Log table 250. Attribute
log_entity_ID of the table 370 provides a correlation with
variant_ID, experimental_set_ID, cleanup_set_ID, and
resequencing_set_ID of the tables 380, 390, 400, 410
respectively. The log_table_name attribute of the table 370
identifies the table correlated by the log_entity_ID.
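The pairing of log_entity_ID with log_table_name can be sketched as a two-step lookup: read the link row, then use the named table to resolve the identifier. The following is an illustrative sketch with Python's sqlite3 module, not the disclosed implementation. The physical table names, the note columns and the sample rows are hypothetical; the key column names follow the text.

```python
import sqlite3

# Whitelist mapping each permitted log_table_name to its key column, so
# that a table name read from the data is never interpolated unchecked.
KEY_COLUMNS = {
    "clone_variant": "variant_ID",
    "experimental": "experimental_set_ID",
    "cleanup": "cleanup_set_ID",
    "resequencing": "resequencing_set_ID",
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clone_variant (variant_ID INTEGER PRIMARY KEY, note TEXT);
    CREATE TABLE experimental (experimental_set_ID INTEGER PRIMARY KEY, note TEXT);
    CREATE TABLE cleanup (cleanup_set_ID INTEGER PRIMARY KEY, note TEXT);
    CREATE TABLE resequencing (resequencing_set_ID INTEGER PRIMARY KEY, note TEXT);
    CREATE TABLE express_link (clone_ID INTEGER, log_entity_ID INTEGER,
                               log_table_name TEXT);
    INSERT INTO clone_variant VALUES (7, 'possible allelic variant');
    INSERT INTO resequencing VALUES (7, 'repeat run to confirm sequence');
    INSERT INTO express_link VALUES (1, 7, 'clone_variant'),
                                    (2, 7, 'resequencing');
""")

def express_record(conn, clone_id):
    """Follow an express link to the record it names, or return None."""
    link = conn.execute(
        "SELECT log_entity_ID, log_table_name FROM express_link "
        "WHERE clone_ID = ?", (clone_id,)
    ).fetchone()
    if link is None:
        return None
    entity_id, table = link
    key = KEY_COLUMNS[table]  # KeyError if the name is not whitelisted
    row = conn.execute(
        f"SELECT note FROM {table} WHERE {key} = ?", (entity_id,)
    ).fetchone()
    return (table, row[0]) if row else None
```

The same log_entity_ID value (7 in the sample data) resolves to different records depending on log_table_name, which is why both attributes are needed.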
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As illustrated in step 60 of Fig. 2, each cDNA sequence 
that has been obtained in step 50 is then compared to the
known sequences in the genetic databases to identify it if 
possible. This process involves comparing sequences (a) 
within a data set, (b) within the internal database and/or (c) 
with external databases. Since the library represents the 
frequency with which an RNA transcript appears within a 
certain source tissue, several different clones may contain 
all or parts of the same gene or its allele(s). The computer
also analyzes insert size by counting individual nucleotides 
in the sequence. 

Data relating to sequence comparison is stored in tables
in the sequence comparison portion of the database shown in
Fig. 9. These tables include a first sequence match log table
510 and a second sequence match log table 515.

The database of the present invention may also access
external databases. Genetic databases may have DNA or protein
sequences. Such database services may also provide searching
or matching tools in addition to named DNAs, proteins or
fragments thereof. As illustrated in Fig. 9, such outside
databases include the GenBank database (box 610), the ProDom
database (box 570), the Blocks database (box 580), the
Pisearch database (box 590) and the Sites database (box 600).

The GenBank database is used as a primary source of known
genes, sequences and other information against which the
sequences stored in the database are compared. Percent
identity and probability are both considered to determine
whether such fragments may be categorized as "exact"
(apparently identical to a known/named human sequence), or
homologous (partially related) to a gene identified in humans
or another species. Unique and unidentified fragments or
sequences are listed by an identifier.

ProDom, Blocks, and Pisearch databases may be accessed in 
order to determine if a particular sequence contains 
functional protein domains or motifs. The patterns may 
provide important structural information for a peptide or 
protein encoded by the sequence.
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In addition, Vectors database 520 stores the DNA 
sequences of the vectors used to clone the cDNAs. By 
comparing the identified cDNA sequences to the sequences in 
this database, vector sequences or stretches of vector 
sequences that show up in a cDNA sequence can be delimited.
Similarly, Repeats database 530 allows repeats which belong to
a multigene family, such as Alu, to be identified. Hidden
Markov database 560 contains software which looks at a
nucleotide sequence alignment and computes a predicted peptide
structure from that sequence. As shown in box 550 of Fig. 9,
other databases which provide additional features can also be
accessed.

When a sequence comparison results in a match, the
information regarding that match is stored in Sequence Match
Log tables 510 and 515. This information generally includes
address information for the matching sequence record in the
external database as well as scores which represent the
quality of the match. In an alternative embodiment it may be
preferable to store the scores in a separate record, since the
scoring methods are not identical for all databases. Sequence
Match Log 510 is linked to sequence archive 290 by the
attribute sequence_ID which they share. It should be noted
that first Sequence Match Log 510 contains better matches,
while marginal matches are stored in the second Sequence Match
Log 515. Both tables (510 and 515) have identical attributes.
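The split between the two match logs can be sketched as a routing step. The text does not state how a "better" match is distinguished from a "marginal" one, so this sketch assumes a single numeric score with two cutoffs; the Match fields other than sequence_ID, the threshold values and the sample accessions are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Match:
    sequence_ID: int   # shared with the sequence archive
    external_db: str   # which external database produced the match
    accession: str     # address of the matching record (placeholder values)
    score: float       # quality of the match; scale is an assumption

def route_matches(matches, good_threshold, marginal_threshold):
    """Split matches into a first (better) and second (marginal) log.

    Matches scoring below marginal_threshold are not logged at all.
    """
    match_log_1, match_log_2 = [], []
    for m in matches:
        if m.score >= good_threshold:
            match_log_1.append(m)
        elif m.score >= marginal_threshold:
            match_log_2.append(m)
    return match_log_1, match_log_2

sample = [
    Match(1, "GenBank", "ACC001", 0.98),
    Match(2, "GenBank", "ACC002", 0.60),
    Match(3, "GenBank", "ACC003", 0.10),
]
good, marginal = route_matches(sample, good_threshold=0.9,
                               marginal_threshold=0.5)
```

Both lists hold the same record type, mirroring the statement that the two tables have identical attributes.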

Function identification, illustrated as step 70 in Fig.
2, is then performed on matches whose quality is above a
specific threshold. The data related to function
identification is stored in the tables as shown in Fig. 10.
These tables include a protein table 720, a protein-sequence
link table 730, a folder table 760 and location table 780.
Protein identification may come from any of the
function/domain databases. The GenBank location or locus and
[...] EC number (enzyme or protein classification)
are stored in table 720. Each entry in this table corresponds
to one or more sequences from the sequence archive table which
was conclusively identified with respect to its function
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Protein table 720 is linked to Sequence Archive table 290 via 
Protein-Sequence Link table 730. Protein table 720 has the
attribute protein_ID in common with Protein-Sequence Link
table 730; and Sequence Archive table 290 has the attribute
sequence_ID in common with Protein-Sequence Link table 730.

Each entry in folder table 760 contains unstructured
annotations for one or more sequences from the archive table
which had interesting but inconclusive matches with the other
databases. Any type of annotation, footnote, or remark can be
recorded in the folder table 760. This permits the researcher
to store desired information without contaminating other
records in the database with information from inconclusive
matches.

Folder table 760 is linked to sequence archive 290 via
function sequence link 750. Function sequence link 750 has an
attribute Folder_ID in common with folder table 760 and an
attribute Sequence_ID in common with sequence archive 290.

The present invention permits a researcher to search the
relational database using keywords and to specify the table(s)
in which the keyword search should be performed. Thus, for
example, a researcher could query the database for all
occurrences of the word "endothelial" in the Biological Source
Table 130.
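A keyword search restricted to a named table might look like the following sketch, which uses Python's sqlite3 module. How the search is actually carried out is not stated in the text; this version assumes a substring match against every column of the chosen table. The tissue and pathology attributes come from the description of table 130, while the bio_source_ID key and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE biological_source (bio_source_ID INTEGER PRIMARY KEY,
                                    tissue TEXT, pathology TEXT);
    INSERT INTO biological_source VALUES
        (1, 'umbilical vein endothelial cells', 'normal'),
        (2, 'liver', 'diseased'),
        (3, 'aorta', 'endothelial injury');
""")

def keyword_search(conn, table, keyword):
    """Return rows of `table` in which any column contains `keyword`."""
    # Accept only tables that actually exist in the database.
    tables = {r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in tables:
        raise ValueError(f"unknown table: {table}")
    # Column names are read from the schema, not from user input.
    columns = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]
    where = " OR ".join(f'CAST("{c}" AS TEXT) LIKE ?' for c in columns)
    pattern = f"%{keyword}%"
    return conn.execute(
        f'SELECT * FROM "{table}" WHERE {where} ORDER BY 1',
        [pattern] * len(columns),
    ).fetchall()

hits = keyword_search(conn, "biological_source", "endothelial")
```

Note that LIKE treats % and _ in the keyword as wildcards; a production search would escape them.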

In addition, the present invention allows the researcher
to store queries in Keywords table 790 shown in Fig. 10. Each
query stored in this table is identified by a unique
Keyword_ID. When a researcher wishes to run a particular
stored query, he or she simply enters the Keyword_ID for the
query. The computer then pulls up the associated record, and
searches the table(s) identified in the Table_name field for
the keyword(s) stored in the Keyword_text field. The results
of the search can be delivered to the user, for example, via
E-mail notification as shown in boxes 800-820 of Fig. 10.
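Running a stored query can be sketched in two steps: fetch the Keywords record, then search the table it names. The field names Keyword_ID, Table_name and Keyword_text follow the text. The SEARCHABLE registry of text columns, the biological_source schema and the sample rows are assumptions made for this illustration, and E-mail delivery of the results is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE biological_source (bio_source_ID INTEGER PRIMARY KEY,
                                    tissue TEXT);
    CREATE TABLE keywords (Keyword_ID INTEGER PRIMARY KEY,
                           Table_name TEXT, Keyword_text TEXT);
    INSERT INTO biological_source VALUES (1, 'endothelial cells'),
                                         (2, 'liver');
    INSERT INTO keywords VALUES (1, 'biological_source', 'endothelial');
""")

# Assumption: each searchable table is registered with the text columns
# to scan, which also serves as a whitelist of table names.
SEARCHABLE = {"biological_source": ["tissue"]}

def run_stored_query(conn, keyword_id):
    """Look up a stored query by Keyword_ID and execute it."""
    rec = conn.execute(
        "SELECT Table_name, Keyword_text FROM keywords WHERE Keyword_ID = ?",
        (keyword_id,),
    ).fetchone()
    if rec is None:
        raise KeyError(keyword_id)
    table, text = rec
    columns = SEARCHABLE[table]
    where = " OR ".join(f"{c} LIKE ?" for c in columns)
    return conn.execute(
        f"SELECT * FROM {table} WHERE {where}",
        [f"%{text}%"] * len(columns),
    ).fetchall()
```

Because the query is stored as data, a researcher can rerun it later by identifier without retyping the keyword or the table name.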
Location table 780 stores information regarding the
location within the cell of each identified sequence.
Location table 780 is linked to Protein table 720 by common
attribute Protein_ID, and stores the location information in
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Similarly, the database structure described above
provides a convenient way to implement subtraction analysis. 
Subtraction analysis determines which sequences are expressed 
more commonly in an activated cell compared to a normal cell. 
To perform subtraction analysis, abundance analysis is 
performed for the normal cell library and the activated cell 
library, and when the information is obtained, a ratio of the 
values is determined. Fig. 12 exemplifies the output of such 
an operation for normal versus LPS activated THP-1.
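The ratio step of subtraction analysis can be sketched in a few lines of Python. This sketch assumes that abundance is simply the number of clones in a library identified as a given sequence and that raw counts are compared without normalizing for library size; neither detail is confirmed by the text. The gene labels other than actin and the sample libraries are hypothetical.

```python
from collections import Counter

def abundance(library):
    """Count how many clones in a library were identified as each gene."""
    return Counter(library)

def subtraction(normal_library, activated_library):
    """Return the activated-to-normal abundance ratio for each gene.

    Genes absent from the normal library get an infinite ratio
    (a convention chosen for this sketch).
    """
    normal = abundance(normal_library)
    activated = abundance(activated_library)
    ratios = {}
    for gene in set(normal) | set(activated):
        n, a = normal[gene], activated[gene]
        ratios[gene] = a / n if n else float("inf")
    return ratios

# Each list entry stands for one sequenced clone, labeled by its match.
normal = ["actin", "actin", "gene_A"]
activated = ["actin", "actin", "gene_A", "gene_A", "gene_A", "gene_B"]
ratios = subtraction(normal, activated)
```

A ratio near 1, as for the housekeeping label actin here, indicates no change on activation, while larger ratios flag sequences expressed more commonly in the activated cell.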

Location analysis can also be performed. Here, the user 
requests, for example, the location of a specific protein 
within a particular activated macrophage. The computer 
identifies the subset of records associated with the desired 
cell in the manner described above, consults the associated 
records in Protein table 720 to verify that the protein is 
present in the cell, and finally looks up the location of the 
protein in Location table 780 and outputs the location to the 
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The sequence location table categories in the preferred 
embodiment are nuclear, cytoplasmic, cell surface or secreted. 
Within the cytoplasm, sequences may be assigned to
cytoskeleton, intracellular membranes, or mitochondria. This 
information is provided in the location field of Location 
table 780. All of the unidentified sequences, regardless of 
their relative abundance, are by default relegated to the
unknown category. 

Yet another function supported by the database of the
preferred embodiment is distribution. This function
determines in which tissues or organs, for example, a given
sequence is found and how frequently. The system steps
through the records in the Sequencing Log 300 and, when there
is a match with the desired sequence, the system determines the
organ and tissue where the specified sequence was found
through the relational association of the database. After all
the sequences have been examined, an output is prepared
representing the requested distribution statistics.
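The distribution function can be sketched as a single pass over the sequencing log. In the described database the path from a log entry to its organ and tissue runs through several linked tables; this sketch collapses that chain into one hypothetical bio_source_ID key and uses exact string equality as a stand-in for sequence matching. All records shown are invented for illustration.

```python
from collections import Counter

# Hypothetical, simplified records (see the assumptions stated above).
sequencing_log = [
    {"sequencing_log_ID": 1, "sequence": "ACGTAC", "bio_source_ID": 10},
    {"sequencing_log_ID": 2, "sequence": "TTGACA", "bio_source_ID": 10},
    {"sequencing_log_ID": 3, "sequence": "ACGTAC", "bio_source_ID": 11},
    {"sequencing_log_ID": 4, "sequence": "ACGTAC", "bio_source_ID": 10},
]
biological_source = {
    10: {"organ": "liver", "tissue": "parenchyma"},
    11: {"organ": "kidney", "tissue": "cortex"},
}

def distribution(target, log, sources):
    """Count occurrences of `target` per (organ, tissue) pair."""
    counts = Counter()
    for record in log:
        if record["sequence"] == target:
            # Follow the shared key to the biological source record.
            src = sources[record["bio_source_ID"]]
            counts[(src["organ"], src["tissue"])] += 1
    return counts

stats = distribution("ACGTAC", sequencing_log, biological_source)
```

The resulting counts per organ and tissue correspond to the distribution statistics prepared after all sequences have been examined.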
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CLAIMS 

1. A computerized storage and retrieval system of
biological information comprising data entry means, display
means, central processing unit, and data storage means for
storing data in a relational database wherein the database
comprises tables, each table having a domain of at least one
attribute in common with at least one other table, said tables
comprising:

a plurality of tables for storing library preparation
data;

a plurality of tables for storing clone preparation data;
a plurality of tables for storing sequencing data; and
at least one table for storing sequence comparison data.

2. The database of the system of claim 1 further
comprising at least one table for storing functional
identification data.

3. The database of the system of claim 1 further
comprising tables for storing express sets.

4. The database of the system of claim 1 wherein the
tables for storing library preparation data comprise a table
for storing mRNA preparation data.

5. The database of the system of claim 1 wherein the
tables for storing library preparation data comprise a table
for storing cDNA construction data.

6. The database of the system of claim 1 wherein the
tables for storing library preparation data comprise a table
for storing biological source data.

7. The database of the system of claim 1 wherein the
tables for storing library preparation data comprise a table
for storing cell culture and treatment data.

8. The database of the system of claim 1 wherein the
tables for storing clone preparation data comprise a table for
storing inoculation data.

9. The database of the system of claim 8 wherein the
tables for storing clone preparation data comprise a table for
storing excision data.
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a relational database for storing said biological data, 
said database comprising a plurality of tables each of said 
tables having at least one attribute having a common domain 
with an attribute of at least one other table of the database;
and

means for determining on the basis of the data stored in 
the database the location of an mRNA within a given cell. 

20. A computer system for storing and retrieving
biological data comprising:

a database comprising tables wherein said biological
information is stored such that the tables are interrelated by
having at least one common attribute;

means for determining a presence and frequency of a
specific RNA in each of a plurality of organs.
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